North Atlantic Ocean
Exemplar-Free Continual Learning for State Space Models
Lee, Isaac Ning, Mahmoodi, Leila, Le, Trung, Harandi, Mehrtash
State-Space Models (SSMs) excel at capturing long-range dependencies with structured recurrence, making them well-suited for sequence modeling. However, their evolving internal states pose challenges in adapting them under Continual Learning (CL). This is particularly difficult in exemplar-free settings, where the absence of prior data leaves updates to the dynamic SSM states unconstrained, resulting in catastrophic forgetting. To address this, we propose Inf-SSM, a novel and simple geometry-aware regularization method that utilizes the geometry of the infinite-dimensional Grassmannian to constrain state evolution during CL. Unlike classical continual learning methods that constrain weight updates, Inf-SSM regularizes the infinite-horizon evolution of SSMs encoded in their extended observability subspace. We show that enforcing this regularization requires solving a matrix equation known as the Sylvester equation, which typically incurs $\mathcal{O}(n^3)$ complexity. We develop a $\mathcal{O}(n^2)$ solution by exploiting the structure and properties of SSMs. This leads to an efficient regularization mechanism that can be seamlessly integrated into existing CL methods. Comprehensive experiments on challenging benchmarks, including ImageNet-R and Caltech-256, demonstrate a significant reduction in forgetting while improving accuracy across sequential tasks.
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
Lee, Yoonho, Chen, Annie S., Tajwar, Fahim, Kumar, Ananya, Yao, Huaxiu, Liang, Percy, Finn, Chelsea
In the training data, 95 % of the waterbirds appear on water backgrounds, and 95% of the landbirds appear on land backgrounds, so the minority groups contain far fewer examples than the majority groups. We tune on 400 images from the target distribution, evenly split between the 4 groups of (bird, background) pairs, giving 100 images per group. CelebA (Sagawa et al., 2019): The task is to classify the hair color in images as "blond" or "not blond", and the label is spuriously correlated with the Male attribute. The source distribution is the training set while the target distribution is a balanced subset with equal amounts of each of the four (hair color, gender) groups. We tune on 400 images from the target distribution, evenly split between the 4 groups of (hair color, gender) pairs, giving 100 images per group. Camelyon17 (Bandi et al., 2018): This dataset is part of the WILDS (Koh et al., 2021) datasets and contains roughly 450,000 images in the source distribution (Train) and 84,000 images in the target distribution (OOD test) of size 96 96. It comprises of medical images collected from 5 hospitals where difference in devices/data-processing between different hospitals produces a natural distribution shift.
Searching for Interaction Functions in Collaborative Filtering
Yao, Quanming, Chen, Xiangning, Kwok, James, Li, Yong
Interaction function (IFC), which captures interactions among items and users, is of great importance in collaborative filtering (CF). The inner product is the most popular IFC due to its success in low-rank matrix factorization. However, interactions in real-world applications can be highly complex. Many other operations (such as plus and concatenation) have also been proposed, and can possibly offer better performance than the inner product. In this paper, motivated by the success of automated machine learning, we propose to search for proper interaction functions (SIF) for CF tasks. We first design an expressive search space for SIF by reviewing and generalizing existing CF approaches. We then propose to represent the search space as a structured multi-layer perceptron, and design a stochastic gradient descent algorithm which can simultaneously update both architectures and learning parameters. Experimental results demonstrate that the proposed method can be much more efficient than popular AutoML approaches, and also obtain much better prediction performance than state-of-the-art CF approaches.